Predicting protein secondary structure and solvent accessibility with an improved multiple linear regression method.
Abstract
We have improved the multiple linear regression (MLR) algorithm for protein secondary structure prediction by combining it with the evolutionary information provided by PSI-BLAST multiple sequence alignments. On the CB513 dataset, the average three-state overall per-residue accuracy, Q3, reached 76.4%, and the segment overlap accuracy, SOV99, reached 73.2%. These figures were obtained with a rigorous jackknife procedure and the strictest reduction of the eight-state DSSP definition to three states. This represents an improvement of approximately 5% in overall per-residue accuracy compared with previous work. Relative solvent accessibility prediction also benefited from this combination of methods: the system achieved 77.7% average jackknifed accuracy for two-state prediction using a 25% relative solvent accessibility threshold, with a Matthews correlation coefficient of 0.548. The improved MLR secondary structure and relative solvent accessibility prediction server is available at http://spg.biosci.tsinghua.edu.cn/.
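The evaluation quantities named in the abstract (the strictest eight-to-three-state DSSP reduction, the per-residue Q3 accuracy, and the two-state solvent accessibility split scored with the Matthews correlation coefficient) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. The helper names are hypothetical. The reduction table (H, G, I to H; E, B to E; all other states to C) is the mapping commonly described as the strictest in the literature. The rule that a residue with RSA below 25% counts as buried is an assumption about how the 25% threshold is applied.

```python
import math

# Assumed "strictest" 8-to-3 DSSP reduction (not taken from the paper's code):
# helices H, G, I -> H; strands E, B -> E; every other state -> C.
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(dssp: str) -> str:
    """Map an 8-state DSSP string to the 3-state alphabet H/E/C."""
    return "".join(DSSP_TO_3.get(state, "C") for state in dssp)


def q3(observed: str, predicted: str) -> float:
    """Per-residue three-state accuracy: percentage of identical positions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * matches / len(observed)


def two_state_rsa(rsa_values, threshold=25.0):
    """Label a residue buried ('b') if its RSA (%) is below threshold, else exposed ('e').

    The strict '<' boundary is an assumption; the paper may treat 25% differently.
    """
    return ["b" if rsa < threshold else "e" for rsa in rsa_values]


def matthews_cc(observed, predicted, positive="e"):
    """Matthews correlation coefficient for a two-state labelling."""
    tp = tn = fp = fn = 0
    for o, p in zip(observed, predicted):
        if p == positive:
            if o == positive:
                tp += 1
            else:
                fp += 1
        elif o == positive:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0.0 when any marginal is empty, a common convention.
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only.
    observed_ss = reduce_dssp("HGIEBTS-")
    print(observed_ss)                      # → HHHEECCC
    print(q3(observed_ss, "HHHECCCC"))      # → 87.5

    obs = two_state_rsa([5.0, 30.0, 60.0, 10.0, 25.0, 0.0])
    pred = two_state_rsa([12.0, 40.0, 20.0, 8.0, 33.0, 2.0])
    print(round(matthews_cc(obs, pred), 4))  # → 0.7071
```

With these made-up inputs, one residue of eight is mispredicted (Q3 = 87.5%) and one exposed residue is predicted buried, giving an MCC of about 0.707. The MLR predictor itself and the jackknife loop (holding out each protein in turn) are not shown.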
Similar resources
Classification Comparison of Prediction of Solvent Accessibility From Protein Sequences
The prediction of residue solvent accessibility from protein sequences has been studied by various methods. A direct comparison of these methods is impossible because of the variety of datasets used and the differences in structure definitions. In this paper we choose 5 classification approaches (decision tree (DT), Support Vector Machine (SVM), Bayesian Statistics (BS), Neural Network (NN) and Mu...
Real value prediction of solvent accessibility in proteins using multiple sequence alignment and secondary structure.
The present study is an attempt to develop a neural network-based method for predicting the real value of solvent accessibility from the sequence using evolutionary information in the form of multiple sequence alignment. In this method, two feed-forward networks with a single hidden layer have been trained with standard back-propagation as a learning algorithm. The Pearson's correlation coeffic...
CABIOS Invited Review
The problem of predicting protein structure from the sequence remains fundamentally unsolved despite more than three decades of intensive research effort. However, new and promising methods in three-dimensional (3D), 2D and 1D prediction have reopened the field. Mean-force potentials derived from the protein databases can distinguish between correct and incorrect models (3D). Inter-residue conta...
Sequence based prediction of relative solvent accessibility using two-stage support vector regression with confidence values
...sequences and the number of known structures. Predicted relative solvent accessibility (RSA) provides useful information for prediction of binding sites and reconstruction of the 3D structure based on a protein sequence. Despite several decades of extensive research in tertiary structure prediction, this task is still a big challenge, especially for sequences that do not have a significant seque...
Prediction of structural features and application to outer membrane protein identification
Protein three-dimensional (3D) structures provide insightful information in many fields of biology. One-dimensional properties derived from 3D structures, such as secondary structure, residue solvent accessibility, residue depth and backbone torsion angles, are helpful for protein function prediction, fold recognition and ab initio folding. Here, we predict various structural features with the ass...
Journal: Proteins
Volume 61, Issue 3
Publication date: 2005